This tutorial has been adapted for NeurocogII 2020 from several sources including: RNA velocity with kallisto | bus and velocyto.R Monocle3 vignettes
First we will install any needed software packages. This may take a while the first time through. By default these code chunks are not evaluated, you will have to change eval=TRUE to force them to install.
# Install devtools if it's not already installed
if (!require(devtools)) {
install.packages("devtools")
}
# Install from GitHub
devtools::install_github("mannau/h5")
devtools::install_github("BUStools/BUSpaRse")
devtools::install_github("satijalab/seurat-wrappers")
devtools::install_github("LTLA/SingleR")
devtools::install_github('linxihui/NNLM')
devtools::install_github('cvarrichio/Matrix.utils')
if (!require(BiocManager)) {
install.packages("BiocManager")
BiocManager::install(c("DropletUtils", "BSgenome.Mmusculus.UCSC.mm10",
"AnnotationHub", "SingleR","celldex","biomaRt","tidyverse","scales","zeallot","densityClust"))
}
if (!require(monocle3)) {
BiocManager::install(c('BiocGenerics', 'DelayedArray', 'DelayedMatrixStats',
'limma', 'S4Vectors', 'SingleCellExperiment',
'SummarizedExperiment', 'batchelor','fgsea'))
devtools::install_github('cole-trapnell-lab/leidenbase') # clustering package dependency for monocle3
devtools::install_github('cole-trapnell-lab/monocle3')
devtools::install_github('scfurl/m3addon') # Monocle3 addon functions from scfurl (optionally included to illustrate high-variance gene identification)
}
For velocity applications via velocyto.R you will need to have boost (e.g. sudo apt-get install libboost-dev) and openmp libraries installed. Instructions for this are not provided in the tutorial. If you cannot get velocyto.R to install, please skip this section.
if (!require(velocyto.R)) {
BiocManager::install(c("pcaMethods"))
devtools::install_github("velocyto-team/velocyto.R")
}
Once we have installed the necessary packages, we can load them into the R session to begin our analysis.
library(AnnotationHub)
library(biomaRt)
library(BSgenome.Mmusculus.UCSC.mm10)
library(tidyverse)
library(velocyto.R)
library(BUSpaRse)
library(scales)
library(zeallot)
library(DropletUtils)
library(SingleR)
library(celldex)
library(NNLM)
library(monocle3)
theme_set(theme_bw()) # Setting a more pleasing default theme
• Understand the basic steps of single cell RNA-Sequencing analysis workflows • Develop a baseline appreciation for cellular heterogeneity both between and within cell types • Identify cell state transitions via pseudotime and RNA Velocity analysis and interpret meanings in an embedding
The dataset we are using is 10x 10k neurons from an E18 mouse (This is a large dataset ~25Gb).
Cells for this sample are from a combined cortex, hippocampus and sub ventricular zone of an E18 mouse. - 11,843 cells detected - Sequenced on Illumina NovaSeq with approximately 30,000 reads per cell - 28bp read1 (16bp Chromium barcode and 12bp UMI), 91bp read2 (transcript), and 8bp I7 sample barcode
10x Genomics sequencing workflow overview
Experimental questions: - What types of cells do we expect to find? - What/how many cellular ‘states’ do we observe at the transcriptional level? - How well defined are different cell types * What do ‘transitioning’ cell types/states look like? - Can we identify differentially expressed and/or marker genes between cell types? - What genes change expression over the course of neuronal development?
First, we will need to download the dataset, unless it has already been downloaded for you.
# Download data
# If error, please make sure that data directory exists off of project root (e.g. dir.create('data'))
if (!file.exists("./data/neuron_10k_v3_fastqs.tar")) {
download.file("http://s3-us-west-2.amazonaws.com/10x.files/samples/cell-exp/3.0.0/neuron_10k_v3/neuron_10k_v3_fastqs.tar", "./data/neuron_10k_v3_fastqs.tar", method = "wget", quiet = TRUE)
}
After downloading, we need to expand (uncompress) the files
cd ./data
tar -xvf ./neuron_10k_v3_fastqs.tar
Since we are downloading the raw sequencing reads for this single cell project, we will first have to ‘preprocess’ the data. In this step, the reads will be 1) mapped to a reference transcriptome to determine from which gene they are derived, 2) ‘demultiplexed’ or assigned to specific cells based on the presence of a cell-specific barcode sequence, and 3) duplicate reads from the same cell:gene fragment will be collapsed using a ‘unique molecular identifier’ (UMI).
This notebook demonstrates the use of command line tools kallisto and bustools. Please use kallisto >= 0.46, whose binary can be downloaded here. Also, please use bustools >= 0.39.3, whose binary of bustools can be found here. User interface of bustools has changed in version 0.39.3. For version 0.39.2, see earlier git commits of this notebook.
After you download the binary, you should decompress the file (if it is tar.gz) with tar -xzvf file.tar.gz in the bash terminal, and add the directory containing the binary to PATH by export PATH=$PATH:/foo/bar, where /foo/bar is the directory of interest. Then you can directly invoke the binary on the command line as we will do in this notebook.
In order to map the read sequences to their cognate genes/transcripts, we first must download and prepare a reference transcriptome. Here we are using the mouse reference transcriptome from Ensembl version 97. We download this dataset using a standard Bioconductor ‘AnnotationHub’ workflow.
# query AnnotationHub for mouse Ensembl annotation
ah <- AnnotationHub() # Connect to the Annotation Hub to query.
## snapshotDate(): 2020-10-27
record<-query(ah,pattern = c("Ensembl", "97", "Mus musculus", "EnsDb")) # Identify the AnnotationHub record associated with Mouse Ensembl v97
Once we’ve identified the AnnotationHub record id AH73905 for Mouse Ensembl v97, we can use this to retrieve all of the annotation records for this build.
# Get mouse Ensembl 97 annotation
edb <- ah[[names(record)]] # I need to comment this out to get the material to build currently...Please uncomment if you are running yourself.
## loading from cache
## require("ensembldb")
As part of this exercise, we will be performing an RNA Velocity analysis (detailed below). This requires us to generate two seperate count matrices, one for reads mapping to ‘spliced’ transcripts and one for reads mapping to ‘unspliced’ transcripts. To do this dual mapping, we need to extract the information that we need to separately map the reads to cDNA sequences as well as intronic sequences. The BUSpaRse library has a function get_velocity_files() that will extract and process the necessary information from an AnnotationHub object.
Note: While this approach generates both the spliced and unspliced annotation records, for most of the downstream analysis we will just be using the count matrices associated with the spliced read mappings.
get_velocity_files(edb, L = 91, Genome = BSgenome.Mmusculus.UCSC.mm10,
out_path = "./output/neuron10k_velocity",
isoform_action = "separate")
This has created all the necessary annotation files and dumped them into the directory ./output/neuron10k_velocity
Once we have the files for the cDNA sequences and the intronic sequences, we need to construct a reference kallisto index for the mapping. This indexing scans through the reference transcriptome and organizes it in a way that makes mapping the read sequences faster and more efficient.
# Intron index
kallisto index -i ./output/mm_cDNA_introns_97.idx ./output/neuron10k_velocity/cDNA_introns.fa
kb wrapperWith kallisto and bustools, it takes several commands to go from fastq files to the spliced and unspliced matrices, which is quite cumbersome. So a wrapper called kb was written to condense those steps to one. The command line tool kb can be installed with
conda activate kallistobus
pip install kb-python
Once you have this installed, we can then we can use the following command to generate the spliced and unspliced count matrices from the raw reads and the assembled reference index. The count matrix files will be the primary data sources for the downstream/secondary analysis.
Example \(genes X cells\) matrix of Unique Molecular Identifier (UMI) counts:
| cell1 | cell2 | cell3 | |
|---|---|---|---|
| gene1 | 0 | 2 | 0 |
| gene2 | 15 | 7 | 3 |
| gene3 | 1 | 0 | 2 |
…
cd ./output/neuron10k_velocity
kb count -i ../mm_cDNA_introns_97.idx -g tr2g.tsv -x 10xv3 -o kb \
-c1 cDNA_tx_to_capture.txt -c2 introns_tx_to_capture.txt --lamanno \
../../data/neuron_10k_v3_fastqs/neuron_10k_v3_S1_L002_R1_001.fastq.gz \
../../data/neuron_10k_v3_fastqs/neuron_10k_v3_S1_L002_R2_001.fastq.gz \
../../data/neuron_10k_v3_fastqs/neuron_10k_v3_S1_L001_R1_001.fastq.gz \
../../data/neuron_10k_v3_fastqs/neuron_10k_v3_S1_L001_R2_001.fastq.gz
Once the above preprocessing is complete, we now have two different matrices: one tallying the number of reads for each gene that mapped to spliced cDNA isoforms, and one for the reads matching unspliced isoforms. As mentioned above, the bulk of the downstream analysis will be performed on the ‘spliced’ matrix. For each matrix, we also have two accessory files: one corresponding to the gene names for each row, and another with the cell IDs (barcodes) for each column. All of these files can be read into R using the function read_velocity_output() from BUSpaRse:
d <- "./output/neuron10k_velocity/kb/counts_unfiltered" # top-level directory containing requisite files with 'spliced' and 'unspliced' prefixes.
c(spliced, unspliced) %<-% BUSpaRse::read_velocity_output(spliced_dir = d,
spliced_name = "spliced",
unspliced_dir = d,
unspliced_name = "unspliced")
spliced[1:6,1:6]
## 6 x 6 sparse Matrix of class "dgCMatrix"
## AAACCCAAGAAACACT AAACCCAAGAAACCAT AAACCCAAGAAACTCA
## ENSMUSG00000000001.4 . . .
## ENSMUSG00000000003.15 . . .
## ENSMUSG00000020875.9 . . .
## ENSMUSG00000000028.15 . . .
## ENSMUSG00000048583.16 . . .
## ENSMUSG00000000049.11 . . .
## AAACCCAAGAAAGTCT AAACCCAAGAAATCCA AAACCCAAGAAATTGC
## ENSMUSG00000000001.4 . . .
## ENSMUSG00000000003.15 . . .
## ENSMUSG00000020875.9 . . .
## ENSMUSG00000000028.15 . . .
## ENSMUSG00000048583.16 . . .
## ENSMUSG00000000049.11 . . .
unspliced[1:6,1:6]
## 6 x 6 sparse Matrix of class "dgCMatrix"
## AAACCCAAGAAACACT AAACCCAAGAAACCCG AAACCCAAGAAACTGT
## ENSMUSG00000000001.4 . . .
## ENSMUSG00000000003.15 . . .
## ENSMUSG00000020875.9 . . .
## ENSMUSG00000000028.15 . . .
## ENSMUSG00000048583.16 . . .
## ENSMUSG00000000049.11 . . .
## AAACCCAAGAAATCCA AAACCCAAGAAATTGC AAACCCAAGAACAGGA
## ENSMUSG00000000001.4 . . .
## ENSMUSG00000000003.15 . . .
## ENSMUSG00000020875.9 . . .
## ENSMUSG00000000028.15 . . .
## ENSMUSG00000048583.16 . . .
## ENSMUSG00000000049.11 . . .
What fraction of UMIs are from unspliced transcripts?
sum(unspliced@x) / (sum(unspliced@x) + sum(spliced@x))
## [1] 0.4285891
We expect around 10,000 cells. There are over 10 times more barcodes here, since most barcodes are from empty droplets. The number of genes is constant and represents ‘all annotated genes’ from the Ensembl v97 mouse database.
dim(spliced)
## [1] 55487 1393951
dim(unspliced)
## [1] 55487 1091523
Here rownames are gene identifiers and column names are the unique cell barcode sequences (cell_id).
Most barcodes only have 0 or 1 UMIs detected.
tot_count <- Matrix::colSums(spliced)
summary(tot_count)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0 1.0 1.0 57.4 1.0 47529.0
Since most of the droplets that are generated by the 10x single cell workflow will not contain cells, we need to identify and remove those ‘empty droplets’ from each matrix. A commonly used method to estimate the number of empty droplets is barcode ranking knee and inflection points, as those are often assumed to represent transitions between two components of a distribution (in this case ‘cells’ vs ‘background’). While more sophisticated methods exist (e.g. see emptyDrops in DropletUtils), for simplicity, we will use the barcode ranking method here. we will use the spliced matrix for filtering, though both matrices have similar inflection points.
Here we calculate the rank order of the cells based on the number of UMIs for each.
bc_rank <- barcodeRanks(spliced)
bc_uns <- barcodeRanks(unspliced)
We can plot number of UMIs on the x axis, and cell barcode rank on the y axis. See this blog post by Lior Pachter for a more detailed explanation.
#' Knee plot for filtering empty droplets
#'
#' Visualizes the inflection point to filter empty droplets. This function plots
#' different datasets with a different color. Facets can be added after calling
#' this function with `facet_*` functions.
#'
#' @param bc_ranks A named list of output from `DropletUtil::barcodeRanks`.
#' @return A ggplot2 object.
#' @importFrom tibble tibble
#' @importFrom purrr map map_dbl
#' @importFrom dplyr distinct
#' @importFrom ggplot2 geom_line geom_hline geom_vline scale_x_log10 scale_y_log10
#' @importFrom tidyr unnest
#' @export
knee_plot <- function(bc_ranks) {
# purrr pluck shorthand doesn't work on S4Vector DataFrame
knee_plt <- tibble(rank = map(bc_ranks, ~ .x[["rank"]]),
total = map(bc_ranks, ~ .x[["total"]]),
dataset = names(bc_ranks)) %>%
unnest(cols = c(rank, total)) %>%
distinct() %>%
dplyr::filter(total > 0)
annot <- tibble(inflection = map_dbl(bc_ranks, ~ metadata(.x)[["inflection"]]),
rank_cutoff = map_dbl(bc_ranks,
~ max(.x$rank[.x$total >
metadata(.x)[["inflection"]]])),
dataset = names(bc_ranks))
p <- ggplot(knee_plt, aes(rank, total, color = dataset)) +
geom_line() +
geom_hline(aes(yintercept = inflection, color = dataset),
data = annot, linetype = 2) +
geom_vline(aes(xintercept = rank_cutoff, color = dataset),
data = annot, linetype = 2) +
scale_x_log10() +
scale_y_log10() +
labs(x = "Rank", y = "Total UMIs")
return(p)
}
knee_plot(list(spliced = bc_rank, unspliced = bc_uns)) +
coord_flip()
Here we can see the knee plot (reading from right to left) and we can identify an inflection point where there is a sharp dropoff in the information content for barcodes. This is (theoretically) the transition between barcodes corresponding to droplets that actually contained cells, vs those that only had a background level of RNA. As expected, it’s roughtly around the 10,000 cell mark which is how many cells we estimated were loaded.
We will use this inflection point (1693 UMIs) to enforce a threshold minimum number of UMIs to filter the barcodes down to only those that contain cells. We can additionally remove any genes with no detectable signal across any cells. We then subset both the spliced and unspliced matrices along both of these dimensions.
bcs_use <- colnames(spliced)[tot_count > metadata(bc_rank)$inflection]
# Remove genes that aren't detected
tot_genes <- Matrix::rowSums(spliced)
genes_use <- rownames(spliced)[tot_genes > 0]
sf <- spliced[genes_use, bcs_use]
uf <- unspliced[genes_use, bcs_use]
So what are we left with?
dim(sf)
## [1] 24825 10391
24825 rows (genes) and 10391 columns (cells)
That concludes the preprocessing and initial filtering to remove empty droplets, now we will need to import the spliced count matrix into a single cell analysis framework for downstream/secondary analysis. Here we will be using the monocle3 framework.
cell_data_set object from spliced matrixThe cell_data_set class defines how the single cell data are stored, indexed, manipulated, and sliced. The three components that we need to create a cell_data_set instance are 1) the sparse count matrix, 2) gene-level annotation, and 3) cell-level annotation. We don’t have much for #’s 2 or 3 at this point other than ids, but it’s enough to get started.
cellMeta<-data.frame("barcode"=colnames(sf))
rownames(cellMeta)<-cellMeta$barcode
geneMeta<-data.frame("gene_id"=rownames(sf))
rownames(geneMeta)<-geneMeta$gene_id
dat <- monocle3::new_cell_data_set(sf,
cell_metadata = cellMeta,
gene_metadata = geneMeta)
## Warning in monocle3::new_cell_data_set(sf, cell_metadata = cellMeta,
## gene_metadata = geneMeta): Warning: gene_metadata must contain a column verbatim
## named 'gene_short_name' for certain functions.
We have now created our base monocle3 cell_data_set object called dat. Lets peek around to see what’s inside:
dat
## class: cell_data_set
## dim: 24825 10391
## metadata(1): cds_version
## assays(1): counts
## rownames(24825): ENSMUSG00000000001.4 ENSMUSG00000020875.9 ...
## ENSMUSG00000118400.1 ENSMUSG00000118435.1
## rowData names(1): gene_id
## colnames(10391): AAACCCACACGCGGTT AAACCCACAGCATACT ... TTTGTTGGTTCAACGT
## TTTGTTGTCCGTTTCG
## colData names(2): barcode Size_Factor
## reducedDimNames(0):
## altExpNames(0):
This is a summary report of dat.
To access the expression matrix we use the exprs() method:
exprs(dat)[1:6,1:6]
## 6 x 6 sparse Matrix of class "dgCMatrix"
## AAACCCACACGCGGTT AAACCCACAGCATACT AAACCCACATACCATG
## ENSMUSG00000000001.4 . . .
## ENSMUSG00000020875.9 . . .
## ENSMUSG00000000028.15 . . .
## ENSMUSG00000048583.16 . . .
## ENSMUSG00000000049.11 . . .
## ENSMUSG00000000058.6 . . .
## AAACCCAGTCGCACAC AAACCCAGTGCACATT AAACCCAGTGGTAATA
## ENSMUSG00000000001.4 . 1 1
## ENSMUSG00000020875.9 . . .
## ENSMUSG00000000028.15 . . .
## ENSMUSG00000048583.16 . . .
## ENSMUSG00000000049.11 . . .
## ENSMUSG00000000058.6 . . .
To access the gene annotations we use the fData() method:
fData(dat)
## DataFrame with 24825 rows and 1 column
## gene_id
## <character>
## ENSMUSG00000000001.4 ENSMUSG00000000001.4
## ENSMUSG00000020875.9 ENSMUSG00000020875.9
## ENSMUSG00000000028.15 ENSMUSG00000000028.15
## ENSMUSG00000048583.16 ENSMUSG00000048583.16
## ENSMUSG00000000049.11 ENSMUSG00000000049.11
## ... ...
## ENSMUSG00000118398.1 ENSMUSG00000118398.1
## ENSMUSG00000118467.1 ENSMUSG00000118467.1
## ENSMUSG00000118421.1 ENSMUSG00000118421.1
## ENSMUSG00000118400.1 ENSMUSG00000118400.1
## ENSMUSG00000118435.1 ENSMUSG00000118435.1
To access the cell annotations we use the pData() method:
pData(dat)
## DataFrame with 10391 rows and 2 columns
## barcode Size_Factor
## <character> <numeric>
## AAACCCACACGCGGTT AAACCCACACGCGGTT 0.796262
## AAACCCACAGCATACT AAACCCACAGCATACT 0.426575
## AAACCCACATACCATG AAACCCACATACCATG 0.502656
## AAACCCAGTCGCACAC AAACCCAGTCGCACAC 1.271944
## AAACCCAGTGCACATT AAACCCAGTGCACATT 0.495048
## ... ... ...
## TTTGTTGGTAGCTAAA TTTGTTGGTAGCTAAA 1.361858
## TTTGTTGGTATCCCAA TTTGTTGGTATCCCAA 0.579775
## TTTGTTGGTCCGAAAG TTTGTTGGTCCGAAAG 0.345133
## TTTGTTGGTTCAACGT TTTGTTGGTTCAACGT 0.860067
## TTTGTTGTCCGTTTCG TTTGTTGTCCGTTTCG 0.606750
There’s not much annotation in there as of yet, lets see what we can add. ## Add gene-level annotation from BioMaRt Using the gene_id information in the featureData slot, we can fetch external annotations for each gene and merge them so we can get more meaningful gene information. biomaRt is an interface in Bioconductor to get information associated with gene_ids.
mart <- biomaRt::useMart(biomart = "ENSEMBL_MART_ENSEMBL",
dataset = "mmusculus_gene_ensembl",
host = 'ensembl.org')
t2g <- biomaRt::getBM(attributes = c("ensembl_gene_id",
"external_gene_name",
"chromosome_name",
"start_position",
"end_position"
), mart = mart) # Fetch annotation information for all gene_ids
fData(dat)$gene_id_trimmed<-str_split_fixed(fData(dat)$gene_id,pattern="\\.",2)[,1] #Trim off the version identifier from the gene_ids
fData(dat)<-dplyr::left_join(as.data.frame(fData(dat)),t2g,by=c("gene_id_trimmed" = "ensembl_gene_id"),sort=FALSE,all.x=TRUE,keep=TRUE) # merge annotation into existing fData().
fData(dat)$gene_short_name<-fData(dat)$external_gene_name # make a field named "gene_short_name" in fData()
head(fData(dat)) #Inspect
## DataFrame with 6 rows and 8 columns
## gene_id gene_id_trimmed
## <character> <character>
## ENSMUSG00000000001.4 ENSMUSG00000000001.4 ENSMUSG00000000001
## ENSMUSG00000020875.9 ENSMUSG00000020875.9 ENSMUSG00000020875
## ENSMUSG00000000028.15 ENSMUSG00000000028.15 ENSMUSG00000000028
## ENSMUSG00000048583.16 ENSMUSG00000048583.16 ENSMUSG00000048583
## ENSMUSG00000000049.11 ENSMUSG00000000049.11 ENSMUSG00000000049
## ENSMUSG00000000058.6 ENSMUSG00000000058.6 ENSMUSG00000000058
## ensembl_gene_id external_gene_name chromosome_name
## <character> <character> <character>
## ENSMUSG00000000001.4 ENSMUSG00000000001 Gnai3 3
## ENSMUSG00000020875.9 ENSMUSG00000020875 Hoxb9 11
## ENSMUSG00000000028.15 ENSMUSG00000000028 Cdc45 16
## ENSMUSG00000048583.16 ENSMUSG00000048583 Igf2 7
## ENSMUSG00000000049.11 ENSMUSG00000000049 Apoh 11
## ENSMUSG00000000058.6 ENSMUSG00000000058 Cav2 6
## start_position end_position gene_short_name
## <integer> <integer> <character>
## ENSMUSG00000000001.4 108014596 108053462 Gnai3
## ENSMUSG00000020875.9 96162283 96167421 Hoxb9
## ENSMUSG00000000028.15 18599197 18630737 Cdc45
## ENSMUSG00000048583.16 142204503 142220553 Igf2
## ENSMUSG00000000049.11 108234180 108305222 Apoh
## ENSMUSG00000000058.6 17281184 17289114 Cav2
Now we have a bit more useful information associated with each gene.
Before we move on to exploration and annotation of the data, we first need to get some summary statistics such as a scaling factor and an estimate of the dispersion for each gene (variance in excess of what is expected for a given model fit).
The dispersion estimate is a useful metric to identify genes that have higher variance than expected. Per-gene technical variance in scRNA-Seq data is well modeled by a negative binomial fit. Genes with excess variation (overdispersion) should have high biological variation above and beyond the modeled technical variance.
dat<-m3addon::calculate_gene_dispersion(dat,id_tag="gene_id")
# Select genes with high dispersion relative to fit
dat<-m3addon::select_genes(dat)
m3addon::plot_gene_dispersion(dat) + scale_color_manual(values=c("black","red"))
This plot shows how the variance in gene expression measures changes as a function of the mean expression level for each gene. The fit line is a negative binomial approximation that corresponds to the ‘technical variance’ expected at all levels of expression. Genes that have a high residual to this fit have more variance than expected and that may represent additional ‘biological variance’. Subsetting to these genes can help improve downstream analyses by focusing on genes that better segregate different cell types or states.
The term ‘preprocessing’ comes up again even though we’re past the initial hurdle of generating the matrix. Standard secondary preprocessing for single cell RNA-Seq involves projecting expression data into the top principal components to identify ranked sources of variation. This is usually done after log-transforming the data to stabilize the variance across the dynamic range of gene expression. Monocle3 conveniently provides a function preprocess_cds() that will do this transform and PCA analysis. We start with a relatively high number of principal components to estimate, 100.
dat <- preprocess_cds(dat,
num_dim = 100,
method = "PCA",
norm_method = "log",
verbose=T,
cores=4)
## Remove noise by PCA ...
plot_pc_variance_explained(dat)
If we look at the variance explained for each principle component, we can see that it starts to trail off after a point. We probably won’t get much more useful information after 40 or so components. So we subset to only the first 40 components and preprocess again.
nDims <- 20 # or 20 works here too downstream
dat <- preprocess_cds(dat,
num_dim = nDims,
method = "PCA",
norm_method = "log",
verbose=T,
cores=4)
## Remove noise by PCA ...
Next we can actually start to assess some of the important quality metrices for each gene and each cell.
It’s a good idea, and saves time/effort to identify and only consider genes whose expression levels are detected in a certain number or proportion of cells within your assay.
dat<-detect_genes(dat)
cellCutoff<-20 # This number is arbitrary
expressed_genes <- row.names(subset(fData(dat),
num_cells_expressed >= cellCutoff))
length(expressed_genes)
## [1] 15365
Once we’ve detected the number of cells expressing each gene, we can look at the distribution to get a better feel
hist(fData(dat)$num_cells_expressed,col="red",breaks=50,main="Number of cells expressing a given gene")
Most genes are not detectably expressed in more than one cell (this is the nature of single cell gene expression assays, and gene expression in general)
Lets log transform and look again. This time, we’ll add a threshold line showing our cutoff for ‘expressed genes’.
hist(log10(fData(dat)$num_cells_expressed),col="red",breaks=50,main="log10 Number of cells expressing a given gene")
abline(v=log10(cellCutoff),lty="dashed")
We have now identified a total of 15365 that are detectably expressed in at least 20 cells in our dataset.
What is the average expression level (in mRNA Copies per cell) for each gene?
fData(dat)$mean_cpc<-Matrix::rowMeans(exprs(dat))
hist(log10(fData(dat[expressed_genes,])$mean_cpc),col="purple",breaks=50,main="Mean RNA copies per cell")
How many genes are expressed in a given cell?
hist(pData(dat)$num_genes_expressed,col="darkgreen",breaks=50,main="Number of genes expressed per cell")
A high proportion of mitochondrial genes may indicate a lower than ideal capture efficiency for a given cell. Here we identify the subset of mitochondrial genes and look at the proportion of reads mapping to ‘mt-’ genes vs genomic genes.
mito_genes<-fData(dat)$gene_id[grepl("^mt-",fData(dat)$gene_short_name)]
pData(dat)$mt_reads <- Matrix::colSums(exprs(dat)[mito_genes,])
pData(dat)$total_reads <- Matrix::colSums(exprs(dat))
pData(dat)$mito_ratio <- pData(dat)$mt_reads/pData(dat)$total_reads
ggplot(as.data.frame(pData(dat)),
aes(x = num_genes_expressed, y = mito_ratio)) +
geom_point() +
labs(x = "Number of genes", y = "Mitochondrial ratio") +
scale_color_brewer(palette = "Set1") +
theme(legend.position = "none") +
ggtitle("Number of genes vs Mitochondrial ratio") +
monocle3:::monocle_theme_opts()
To get a general picture of the capture efficiency and depth of information for each cell we can look at the total mRNA mass recovered per cell.
pData(dat)$Total_mRNA<-Matrix::colSums(exprs(dat))
hist(pData(dat)$Total_mRNA,col="darkblue",breaks=50,main="Total mRNAs sequenced per cell")
Bonus question: How many mRNAs do we expect are in a given eukaryotic cell?
For each of the above QC ‘criterion’ we can define thresholds that can be used to filter cells/genes to improve the quality of the dataset. Here is usually where obvious doublet cells (more than one cell is associated with a single barcode sequence) or low-quality cells are removed prior to doing any further statistical interpretations.
Now we’re ready to visualize the cells. To do so, you can use either t-SNE, which is very popular in single-cell RNA-seq, or UMAP, which is increasingly common. Monocle 3 uses UMAP by default, since it is both faster and better suited for clustering and trajectory analysis in RNA-seq. To reduce the dimensionality of the data down into the X, Y plane so we can plot it easily, we call reduce_dimension():
dat <- reduce_dimension(dat,
verbose=TRUE,
reduction_method="UMAP",
cores = 4)
## No preprocess_method specified, using preprocess_method = 'PCA'
## Note: reduce_dimension will produce slightly different output each time you run it unless you set 'umap.fast_sgd = FALSE' and 'cores = 1'
## Running Uniform Manifold Approximation and Projection
## 22:38:41 UMAP embedding parameters a = 1.577 b = 0.8951
## 22:38:41 Read 10391 rows and found 20 numeric columns
## 22:38:42 Building Annoy index with metric = cosine, n_trees = 50
## 0% 10 20 30 40 50 60 70 80 90 100%
## [----|----|----|----|----|----|----|----|----|----|
## **************************************************|
## 22:38:43 Writing NN index file to temp file /var/folders/rn/f0sgm51j3wqf9p3my_z549c80000gn/T//RtmpJ0wEZG/file15cba21a9aea8
## 22:38:43 Searching Annoy index using 4 threads, search_k = 1500
## 22:38:43 Annoy recall = 100%
## 22:38:44 Commencing smooth kNN distance calibration using 4 threads
## 22:38:46 Initializing from normalized Laplacian + noise
## 22:38:46 Commencing optimization for 200 epochs, with 206332 positive edges
## 22:38:50 Optimization finished
To visualize the dimensionality reduction we use plot_cells():
plot_cells(dat)
## No trajectory to plot. Has learn_graph() been called yet?
## cluster not found in colData(cds), cells will not be colored
## cluster_cells() has not been called yet, can't color cells by cluster
And we can look at how different features of the cells are distributed across this embedding. For now we only have technical features to view.
plot_cells(dat,color_cells_by="num_genes_expressed",cell_size=0.75)
## No trajectory to plot. Has learn_graph() been called yet?
## Cells aren't colored in a way that allows them to be grouped.
plot_cells(dat,color_cells_by="Total_mRNA",cell_size=0.75)
## No trajectory to plot. Has learn_graph() been called yet?
## Cells aren't colored in a way that allows them to be grouped.
plot_cells(dat,color_cells_by="mito_ratio",cell_size=0.75)
## No trajectory to plot. Has learn_graph() been called yet?
## Cells aren't colored in a way that allows them to be grouped.
What can we deduce/hypothesize about the embedding (we will formally test later)? Number of cell types? Diversity of cell types?
We can also map the expression level of individual genes (markers) onto this embedding to start to parse out meaning.
plot_cells(dat,genes=c("Sox9","Gad1","Slc17a6","Slc17a7"),cell_size=0.75)
## No trajectory to plot. Has learn_graph() been called yet?
plot_cells(dat,genes=c("Pax6","Eomes","Fezf2","Tle4","Satb2","Pou3f2"),cell_size=0.75)
## No trajectory to plot. Has learn_graph() been called yet?
We next want to impose a clustering solution onto this embedding to group cells with similar transcriptional profiles together. We use cluster_cells() to perform this function in monocle3:
dat <- cluster_cells(dat,
verbose=TRUE,
resolution=1e-4)
## Running leiden clustering algorithm ...
## Run kNN based graph clustering starts:
## -Input data of 10391 rows and 2 columns
## -k is set to 20
## Finding nearest neighbors...
## DONE. Run time: 0.0660000000000025s
## Compute jaccard coefficient between nearest-neighbor sets ...
## DONE. Run time: 0.0029999999999859s
## Build undirected graph from the weighted links ...
## DONE. Run time: 0.180000000000007s
## Run leiden clustering ...
## leiden_find_partition: partition_type CPMVertexPartition
## leiden_find_partition: seed 1134163607
## leiden_find_partition: resolution_parameter 1e-04
## leiden_find_partition: num_iter 2
## leiden_find_partition: initial_membership 0
## leiden_find_partition: edge_weights 0
## leiden_find_partition: node_sizes 0
## leiden_find_partition: number vertices 10391
## leiden_find_partition: number edges 207820
## Current resolution is 1e-04; Modularity is 0.78842625910158; Quality is 412334.2366; Significance is 209671.374526695; Number of clusters is 12
## Done. Run time: 0.44479513168335s
## Clustering statistics
## resolution_parameter quality modularity significance cluster_count selected
## 1e-04 412334.2 0.7884263 209671.4 12 *
##
## Cell counts by cluster
## cluster cell_count cell_fraction
## 1 3388 0.326
## 2 2380 0.229
## 3 1506 0.145
## 4 1464 0.141
## 5 957 0.092
## 6 245 0.024
## 7 107 0.010
## 8 107 0.010
## 9 76 0.007
## 10 73 0.007
## 11 44 0.004
## 12 44 0.004
##
## Maximal modularity is 0.78842625910158 for resolution parameter 1e-04
##
## Run kNN based graph clustering DONE.
## -Number of clusters: 12
plot_cells(dat, color_cells_by="cluster", cell_size=0.75, group_label_size = 5, show_trajectory_graph=FALSE)
## Warning: The `add` argument of `group_by()` is deprecated as of dplyr 1.0.0.
## Please use the `.add` argument instead.
Remember, clustering is a useful tool but is also arbitrary for continuous cell-cell transitions.
As a crude first-pass annotation, we can leverage publicly available, bulk RNA-Seq gene expression information for specific cell types to try and ‘learn’ cell type annotations for each cell. This ‘transfer learning’ approach can provide a resonable starting point for coarse cell type identification. The SingleR package leverages reference transcriptomic datasets of pure cell types to infer the cell of origin of each of the single cells independently. First we need to fetch a reference dataset of bulk RNA-Seq expression profiles.
mouse.rnaseq <- celldex::MouseRNAseqData(ensembl = TRUE)
## snapshotDate(): 2020-10-27
## see ?celldex and browseVignettes('celldex') for documentation
## loading from cache
## see ?celldex and browseVignettes('celldex') for documentation
## loading from cache
## snapshotDate(): 2020-10-27
## loading from cache
## Warning: Unable to map 2180 of 21214 requested IDs.
And then, we can use this dataset to compare our single cell expression profiles against to apply coarse cell type labels.
dat.exprs<-exprs(dat[expressed_genes,])
rownames(dat.exprs) <- str_remove(rownames(dat.exprs), "\\.\\d+")
system.time(annots<-SingleR(dat.exprs,
ref = mouse.rnaseq, labels = colData(mouse.rnaseq)$label.fine,
de.method = "classic", method = "single", BPPARAM = BiocParallel::MulticoreParam(4))
)
Once we’ve learned labels for each cell, we can then add this information into the phenotypeData (pData()) slot of our main object.
pData(dat)$cell_type <- annots$pruned.labels
plot_cells(dat,color_cells_by="cell_type", group_cells_by = "cluster", cell_size=0.75, label_cell_groups = FALSE)
## No trajectory to plot. Has learn_graph() been called yet?
table(pData(dat)$cell_type)
##
## aNSCs Astrocytes B cells
## 195 2 2
## Endothelial cells Ependymal Erythrocytes
## 84 2 140
## Fibroblasts Fibroblasts activated Fibroblasts senescent
## 16 27 10
## Granulocytes Macrophages Macrophages activated
## 1 9 3
## Microglia Monocytes Neurons
## 19 4 1364
## NPCs Oligodendrocytes OPCs
## 8207 1 200
## qNSCs
## 13
The algorithm does a reasonable job at identifying major cell types, and identifies a number of different cell types in our dataset. Before we go any further, lets filter the dataset down to only cell types that we might be interested in for downstream analysis.
inds <- annots$pruned.labels %in% c("NPCs", "Neurons", "OPCs", "Oligodendrocytes",
"qNSCs", "aNSCs", "Astrocytes", "Ependymal")
# Only keep these cell types
cells_use <- row.names(annots)[inds]
dat <- dat[,cells_use]
table(pData(dat)$cell_type)
##
## aNSCs Astrocytes Ependymal Neurons
## 195 2 2 1364
## NPCs Oligodendrocytes OPCs qNSCs
## 8207 1 200 13
# Preprocess again on filtered dataset
nDims <- 20
dat <- preprocess_cds(dat,
num_dim = nDims,
method = "PCA",
verbose=TRUE)
## Remove noise by PCA ...
dat <- reduce_dimension(dat,
verbose=TRUE,
reduction_method="UMAP",
umap.n_neighbors=20,
cores = 4)
## No preprocess_method specified, using preprocess_method = 'PCA'
## Note: reduce_dimension will produce slightly different output each time you run it unless you set 'umap.fast_sgd = FALSE' and 'cores = 1'
## Running Uniform Manifold Approximation and Projection
## 22:39:26 UMAP embedding parameters a = 1.577 b = 0.8951
## 22:39:26 Read 9984 rows and found 20 numeric columns
## 22:39:26 Building Annoy index with metric = cosine, n_trees = 50
## 0% 10 20 30 40 50 60 70 80 90 100%
## [----|----|----|----|----|----|----|----|----|----|
## **************************************************|
## 22:39:27 Writing NN index file to temp file /var/folders/rn/f0sgm51j3wqf9p3my_z549c80000gn/T//RtmpJ0wEZG/file15cba6d228815
## 22:39:27 Searching Annoy index using 4 threads, search_k = 2000
## 22:39:28 Annoy recall = 100%
## 22:39:28 Commencing smooth kNN distance calibration using 4 threads
## 22:39:30 Initializing from normalized Laplacian + noise
## 22:39:30 Commencing optimization for 500 epochs, with 265962 positive edges
## 22:39:40 Optimization finished
dat <- cluster_cells(dat,
verbose=TRUE,
resolution=1e-5)
## Running leiden clustering algorithm ...
## Run kNN based graph clustering starts:
## -Input data of 9984 rows and 2 columns
## -k is set to 20
## Finding nearest neighbors...
## DONE. Run time: 0.0289999999999964s
## Compute jaccard coefficient between nearest-neighbor sets ...
## DONE. Run time: 0.00100000000000477s
## Build undirected graph from the weighted links ...
## DONE. Run time: 0.0420000000000016s
## Run leiden clustering ...
## leiden_find_partition: partition_type CPMVertexPartition
## leiden_find_partition: seed 1690597741
## leiden_find_partition: resolution_parameter 1e-05
## leiden_find_partition: num_iter 2
## leiden_find_partition: initial_membership 0
## leiden_find_partition: edge_weights 0
## leiden_find_partition: node_sizes 0
## leiden_find_partition: number vertices 9984
## leiden_find_partition: number edges 199680
## Current resolution is 1e-05; Modularity is 0.479051115749542; Quality is 398820.8504; Significance is 83831.7474052163; Number of clusters is 5
## Done. Run time: 0.322535037994385s
##
## Clustering statistics
## resolution_parameter quality modularity significance cluster_count selected
## 1e-05 398820.9 0.4790511 83831.75 5 *
##
## Cell counts by cluster
## cluster cell_count cell_fraction
## 1 6898 0.691
## 2 1505 0.151
## 3 1437 0.144
## 4 105 0.011
## 5 39 0.004
##
## Maximal modularity is 0.479051115749542 for resolution parameter 1e-05
##
## Run kNN based graph clustering DONE.
## -Number of clusters: 5
plot_cells(dat,color_cells_by="cell_type", group_cells_by = "cluster", cell_size=0.75, label_cell_groups = FALSE) + scale_color_brewer(palette="Set1")
## No trajectory to plot. Has learn_graph() been called yet?
plot_cells(dat,color_cells_by="cluster",cell_size=0.75)
## No trajectory to plot. Has learn_graph() been called yet?
plot_cells(dat,color_cells_by="partition",cell_size=0.75)
## No trajectory to plot. Has learn_graph() been called yet?
plot_cells(dat,genes=c("Pax6","Eomes","Fezf2","Tle4","Satb2","Pou3f2"),cell_size=0.75)
## No trajectory to plot. Has learn_graph() been called yet?
One of our objectives is to find marker genes expressed by each cluster/cell type. Once cells have been clustered, we can ask what genes makes them different from one another. To do that, start by calling the top_markers() function:
marker_test_res <- top_markers(dat, group_cells_by="cluster",
reference_cells=500, cores=4)
top_specific_markers <- marker_test_res %>%
dplyr::filter(fraction_expressing >= 0.30) %>%
group_by(cell_group) %>%
top_n(5, pseudo_R2)
top_specific_marker_ids <- unique(top_specific_markers %>% pull(gene_id))
plot_genes_by_group(dat,
top_specific_marker_ids,
group_cells_by="cluster",
ordering_type="maximal_on_diag",
max.size=4)
## Warning in if (axis_order == "marker_group") {: the condition has length > 1 and
## only the first element will be used
plot_cells(dat,color_cells_by="cluster",cell_size=0.75, group_label_size = 8)
## No trajectory to plot. Has learn_graph() been called yet?
Pseudotime is a measure of how much progress an individual cell has made through a process such as cell differentiation. Since we have at least one population of cells that is transitioning from a progenitor state to a series of mature neurons, can we identify potential differentiation trajectories?
First we must learn a ‘trajectory graph’ across each partition (contiguous group of cells) in the data.
dat <- learn_graph(dat)
##
|
| | 0%
|
|======================================================================| 100%
##
|
| | 0%
|
|======================================================================| 100%
plot_cells(dat,
color_cells_by = "cell_type",
label_groups_by_cluster=FALSE,
label_leaves=TRUE,
label_branch_points=TRUE,
label_cell_groups = FALSE,
graph_label_size=3,
cell_size=0.75) + scale_color_brewer(palette="Set1")
## Warning: `select_()` was deprecated in dplyr 0.7.0.
## Please use `select()` instead.
dat <- order_cells(dat,
root_pr_nodes="Y_54")
plot_cells(dat,
color_cells_by = "pseudotime",
label_cell_groups=FALSE,
label_leaves=FALSE,
label_branch_points=FALSE,
graph_label_size=3,
cell_size=0.75)
Here we are also setting the ‘root’ of this trajectory to begin at a node in the NPC cells.
How do we find the genes that are differentially expressed on the different paths through the trajectory? How do we find the ones that are restricted to the beginning of the trajectory? Or excluded from it? Monocle3 uses a ‘principal graph test’ to test whether cells at similar positions on the trajectory have correlated expression
system.time(pseudotime_pr_test_res<-graph_test(dat,
neighbor_graph="principal_graph",
cores=4)
)
pr_deg_ids <- row.names(subset(pseudotime_pr_test_res, q_value < 1e-50))
Here are some high-scoring differentially expressed genes along the pseudotime trajectory in their UMAP embedding
pseudotime_genes <- c("Pax6","Btg2","Snap25","Fezf2","Hes6","Cux1")
#pseudotime_genes <- head(pseudotime_pr_test_res[order(pseudotime_pr_test_res$q_value),])$external_gene_name
plot_cells(dat, genes=pseudotime_genes,
show_trajectory_graph=TRUE,
label_cell_groups=FALSE,
label_leaves=FALSE,
cell_size=0.5)
We can also explicitly look at how the expression of these genes along the pseudotime trajectory. The function plot_genes_in_pseudotime() takes a small set of genes and shows you their dynamics as a function of pseudotime:
pseudotime_lineage_cds <- dat[fData(dat)$gene_short_name %in% pseudotime_genes,
pData(dat)$cell_type %in% c("NPCs","Neurons")]
plot_genes_in_pseudotime(pseudotime_lineage_cds,
color_cells_by="cell_type",
min_expr=0.5,
ncol=2)
dat_3d <- reduce_dimension(dat, max_components = 3, cores=4)
## No preprocess_method specified, using preprocess_method = 'PCA'
## Note: reduce_dimension will produce slightly different output each time you run it unless you set 'umap.fast_sgd = FALSE' and 'cores = 1'
dat_3d <- cluster_cells(dat_3d,resolution = 1e-5)
dat_3d <- learn_graph(dat_3d)
##
|
| | 0%
|
|======================================================================| 100%
##
|
| | 0%
|
|======================================================================| 100%
##
|
| | 0%
|
|======================================================================| 100%
#dat_3d <- order_cells(dat_3d)
plot_cells_3d(dat_3d, color_cells_by="cell_type")
Finally, we can step beyond marker gene analysis, and use some ‘latent space’ discovery methods to learn patterns of co-regulated gene expression.
These patterns can be used to identify/define: * Cell type identities * Biological processes * Spatial gradients * Other cellular features
… often with significantly greater resolution and precision than marker genes.
Importantly, these patterns are ‘data driven’ and often yield insights into the heterogeneity and complexity of a given dataset that were unanticipated.
nPatterns<-30
# Takes about ~3-4 minutes...
system.time(dat.nnmf<-NNLM::nnmf(as.matrix(log10(exprs(dat)[expressed_genes,]+1)), k = nPatterns,
n.threads = 6, trace = 10, verbose = TRUE, show.warning = TRUE))
## user system elapsed
## 445.946 23.597 470.677
# Gene x Pattern matrix
dim(dat.nnmf$W)
## [1] 15365 30
# Pattern x Cell matrix
dim(dat.nnmf$H)
## [1] 30 9984
#Add patterns to phenotype data for visualization
patterns.df<-data.frame(t(dat.nnmf$H))
colnames(patterns.df)<-paste0("Pattern_",c(1:nPatterns))
pData(dat)<-cbind(pData(dat),patterns.df)
pdf("figures/patterns.pdf",width=10,height=10)
lapply(c(1:nPatterns),function(i){
plot_cells(dat, color_cells_by=paste0("Pattern_",i), cell_size=0.75) +
ggtitle(paste0("Pattern_",i)) +
coord_equal(1)
})
## [[1]]
##
## [[2]]
##
## [[3]]
##
## [[4]]
##
## [[5]]
##
## [[6]]
##
## [[7]]
##
## [[8]]
##
## [[9]]
##
## [[10]]
##
## [[11]]
##
## [[12]]
##
## [[13]]
##
## [[14]]
##
## [[15]]
##
## [[16]]
##
## [[17]]
##
## [[18]]
##
## [[19]]
##
## [[20]]
##
## [[21]]
##
## [[22]]
##
## [[23]]
##
## [[24]]
##
## [[25]]
##
## [[26]]
##
## [[27]]
##
## [[28]]
##
## [[29]]
##
## [[30]]
dev.off()
## quartz_off_screen
## 2
targetPattern<-10
plot_cells(dat, color_cells_by=paste0("Pattern_",targetPattern), cell_size=0.75) +
ggtitle(paste0("Pattern ",targetPattern)) +
coord_equal(1)
geneWeights.df<-data.frame(dat.nnmf$W)
colnames(geneWeights.df)<-paste0("Pattern_",c(1:nPatterns))
#fData(dat)[head(rownames(dat.nnmf$W)[order(dat.nnmf$W[,patternOfInterest],decreasing=TRUE)]),]
tmp<-as.data.frame(merge(fData(dat)[,c("gene_id","gene_short_name")],geneWeights.df,by=0))
targetPatterns<-c(5,9,10,23,targetPattern)
DT::datatable(tmp[,c("gene_id","gene_short_name",
unlist(lapply(targetPatterns,function(i){paste0("Pattern_",i)
}))
)])
## Warning in instance$preRenderHook(instance): It seems your data is too big
## for client-side DataTables. You may consider server-side processing: https://
## rstudio.github.io/DT/server.html
Finally, we can revist the RNA velocity analysis to make predictions about how cells are traveling through the UMAP embedding. By estimating the potential turnover rates for individual genes, we can then make predictions about where each gene:cell is in this process. By aggregating this information for a given cell, and projecting into the UMAP embedding, we can make an estimate of ‘where’ each cell will be in this embedding in about an hour. This helps us understand the flow of cells through these manifold embeddings.
cell.dist <- as.dist(1-armaCor(t(dat@reducedDims$PCA))) ### TODO: point to cell_data_set PCA projection to calculate cell-cell distance
fit.quantile <- 0.02
system.time(rvel.cd <- gene.relative.velocity.estimates(sf[,cells_use],uf[,cells_use],deltaT=1,kCells=25,fit.quantile=fit.quantile, cell.dist=cell.dist, n.cores=7)) ### TODO: Add cell.dist to velocity estimates
emb <- dat@reducedDims$UMAP
nCols<-length(unique(pData(dat)$cell_type))
cell.colors<-rainbow(nCols)[as.factor(pData(dat)$cell_type)]
names(cell.colors)<-colnames(sf[,cells_use])
system.time(velocity_embedding<-show.velocity.on.embedding.cor(emb,
rvel.cd,
n=200,
scale='sqrt',
cell.colors=ac(cell.colors,alpha=0.5),
cex=0.8,
arrow.scale=3,
show.grid.flow=TRUE,
min.grid.cell.mass=0.5,
grid.n=40,
arrow.lwd=1
,do.par=F,
cell.border.alpha = 0.1,
n.cores=6,
return.details=TRUE)
)
gene <- "ENSMUSG00000027210.20" ## Meis2
gene.relative.velocity.estimates(sf[,cells_use],uf[,cells_use],deltaT=1,kCells = 25,kGenes=1,fit.quantile=fit.quantile,cell.emb=emb,cell.colors=cell.colors,cell.dist=cell.dist,show.gene=gene,old.fit=rvel.cd,do.par=T)
gene <- "ENSMUSG00000027168.21" ## Pax6
gene.relative.velocity.estimates(sf[,cells_use],uf[,cells_use],deltaT=1,kCells = 25,kGenes=1,fit.quantile=fit.quantile,cell.emb=emb,cell.colors=ac(cell.colors,alpha=0.5),show.gene=gene,old.fit=rvel.cd,do.par=T)
gene <- "ENSMUSG00000021743.6"
gene.relative.velocity.estimates(sf[,cells_use],uf[,cells_use],deltaT=1,kCells = 100,kGenes=1,fit.quantile=fit.quantile,cell.emb=emb,cell.colors=ac(cell.colors,alpha=0.5),show.gene=gene,old.fit=rvel.cd,do.par=T)
sessionInfo()
## R version 4.0.4 (2021-02-15)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Big Sur 10.16
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats4 parallel stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] ensembldb_2.14.0 AnnotationFilter_1.14.0
## [3] GenomicFeatures_1.42.2 AnnotationDbi_1.52.0
## [5] monocle3_0.2.3.0 NNLM_0.4.4
## [7] celldex_1.0.0 SingleR_1.4.1
## [9] DropletUtils_1.10.3 SingleCellExperiment_1.12.0
## [11] SummarizedExperiment_1.20.0 Biobase_2.50.0
## [13] MatrixGenerics_1.2.1 matrixStats_0.58.0
## [15] zeallot_0.1.0 scales_1.1.1
## [17] BUSpaRse_1.5.3 velocyto.R_0.6
## [19] Matrix_1.3-2 forcats_0.5.1
## [21] stringr_1.4.0 dplyr_1.0.5
## [23] purrr_0.3.4 readr_1.4.0
## [25] tidyr_1.1.3 tibble_3.1.0
## [27] ggplot2_3.3.3 tidyverse_1.3.0
## [29] BSgenome.Mmusculus.UCSC.mm10_1.4.0 BSgenome_1.58.0
## [31] rtracklayer_1.50.0 Biostrings_2.58.0
## [33] XVector_0.30.0 GenomicRanges_1.42.0
## [35] GenomeInfoDb_1.26.4 IRanges_2.24.1
## [37] S4Vectors_0.28.1 biomaRt_2.46.3
## [39] AnnotationHub_2.22.0 BiocFileCache_1.14.0
## [41] dbplyr_2.1.0 BiocGenerics_0.36.0
##
## loaded via a namespace (and not attached):
## [1] rappdirs_0.3.3 R.methodsS3_1.8.1
## [3] coda_0.19-4 bit64_4.0.5
## [5] knitr_1.31 irlba_2.3.3
## [7] DelayedArray_0.16.2 R.utils_2.10.1
## [9] data.table_1.14.0 RCurl_1.98-1.2
## [11] generics_0.1.0 leidenbase_0.1.2
## [13] RhpcBLASctl_0.20-137 RSQLite_2.2.4
## [15] RANN_2.6.1 proxy_0.4-25
## [17] bit_4.0.4 xml2_1.3.2
## [19] lubridate_1.7.10 httpuv_1.5.5
## [21] assertthat_0.2.1 viridis_0.5.1
## [23] xfun_0.22 hms_1.0.0
## [25] jquerylib_0.1.3 evaluate_0.14
## [27] promises_1.2.0.1 fansi_0.4.2
## [29] progress_1.2.2 readxl_1.3.1
## [31] htmlwidgets_1.5.3 igraph_1.2.6
## [33] DBI_1.1.1 spdep_1.1-5
## [35] ellipsis_0.3.1 crosstalk_1.1.1
## [37] RSpectra_0.16-0 backports_1.2.1
## [39] deldir_0.2-10 sparseMatrixStats_1.2.1
## [41] vctrs_0.3.6 cachem_1.0.4
## [43] withr_2.4.1 grr_0.9.5
## [45] GenomicAlignments_1.26.0 prettyunits_1.1.1
## [47] cluster_2.1.1 ExperimentHub_1.16.0
## [49] lazyeval_0.2.2 crayon_1.4.1
## [51] edgeR_3.32.1 pkgconfig_2.0.3
## [53] slam_0.1-48 labeling_0.4.2
## [55] units_0.7-0 nlme_3.1-152
## [57] ProtGenerics_1.22.0 rlang_0.4.10
## [59] lifecycle_1.0.0 modelr_0.1.8
## [61] rsvd_1.0.3 cellranger_1.1.0
## [63] raster_3.4-5 Rhdf5lib_1.12.1
## [65] boot_1.3-27 Matrix.utils_0.9.8
## [67] reprex_1.0.0 viridisLite_0.3.0
## [69] bitops_1.0-6 R.oo_1.24.0
## [71] KernSmooth_2.23-18 rhdf5filters_1.2.0
## [73] speedglm_0.3-3 blob_1.2.1
## [75] DelayedMatrixStats_1.12.3 classInt_0.4-3
## [77] beachmat_2.6.4 memoise_2.0.0
## [79] magrittr_2.0.1 plyr_1.8.6
## [81] gdata_2.18.0 zlibbioc_1.36.0
## [83] compiler_4.0.4 dqrng_0.2.1
## [85] RColorBrewer_1.1-2 pcaMethods_1.82.0
## [87] Rsamtools_2.6.0 cli_2.3.1
## [89] LearnBayes_2.15.1 MASS_7.3-53.1
## [91] mgcv_1.8-34 tidyselect_1.1.0
## [93] stringi_1.5.3 highr_0.8
## [95] yaml_2.2.1 BiocSingular_1.6.0
## [97] askpass_1.1 locfit_1.5-9.4
## [99] ggrepel_0.9.1 pbmcapply_1.5.0
## [101] grid_4.0.4 sass_0.3.1
## [103] tools_4.0.4 rstudioapi_0.13
## [105] gridExtra_2.3 plyranges_1.10.0
## [107] farver_2.1.0 digest_0.6.27
## [109] BiocManager_1.30.10 shiny_1.6.0
## [111] Rcpp_1.0.6 broom_0.7.5
## [113] scuttle_1.0.4 BiocVersion_3.12.0
## [115] later_1.1.0.1 RcppAnnoy_0.0.18
## [117] httr_1.4.2 sf_0.9-7
## [119] colorspace_2.0-0 rvest_1.0.0
## [121] XML_3.99-0.5 fs_1.5.0
## [123] splines_4.0.4 uwot_0.1.10
## [125] expm_0.999-6 sp_1.4-5
## [127] plotly_4.9.3 spData_0.3.8
## [129] xtable_1.8-4 jsonlite_1.7.2
## [131] R6_2.5.0 gmodels_2.18.1
## [133] pillar_1.5.1 htmltools_0.5.1.1
## [135] mime_0.10 DT_0.17
## [137] glue_1.4.2 fastmap_1.1.0
## [139] BiocParallel_1.24.1 BiocNeighbors_1.8.2
## [141] class_7.3-18 interactiveDisplayBase_1.28.0
## [143] codetools_0.2-18 utf8_1.2.1
## [145] lattice_0.20-41 bslib_0.2.4
## [147] curl_4.3 gtools_3.8.2
## [149] openssl_1.4.3 limma_3.46.0
## [151] rmarkdown_2.7 munsell_0.5.0
## [153] e1071_1.7-5 rhdf5_2.34.0
## [155] GenomeInfoDbData_1.2.4 HDF5Array_1.18.1
## [157] haven_2.3.1 reshape2_1.4.4
## [159] gtable_0.3.0
With caches
knitEnd <- Sys.time()
knitEnd - knitStart
## Time difference of 26.57299 mins